ABSTRACT

Item analysis of Multiple Choice Questions (MCQs) is the process of collecting, summarizing and utilizing information
from students' responses to evaluate the quality of test items. Difficulty Index (p-value), Discrimination Index (DI) and
Distractor Efficiency (DE) are the parameters which help to evaluate the quality of MCQs used in an examination. This
study was designed to investigate the relationship of items having good p-value and DI with their DE, and their utility
in framing 'ideal questions'. It further evaluates MCQs as a tool of assessment so as to improve the curricula in
Medical Education. In this study, 20 test items of 'Type A' MCQ tests of assessment were selected. The p-values, DI and DE
were estimated. The relationship between the p-value and DI for each test item was determined by Pearson correlation
analysis. Mean p-value and DI of the test were 66.53 ± 16.82% and 0.41 ± 0.16 respectively. Only 20% of the total test
items crossed the p-value of 80%, indicative of an easy difficulty level. 95% of the test items showed acceptable (> 0.2)
DI. 12 out of 20 test items showed excellent DI (> 0.4). 8 (40%) test items were regarded as 'ideal', having a p-value from
30 - 70% and DI > 0.24. Correlation analysis revealed that DI correlated negatively with p-value (r = -0.288; P = 0.219),
although the correlation was not statistically significant. Mean DE of the test was 76.25 ± 22.18%. The DE was directly
related to the DI. Items with good and excellent DI had DE of 66.67 ± 14.43% and 83.33 ± 19.46% respectively. In
conclusion, an acceptable level of test difficulty and discrimination was maintained in the type A MCQ test. The test
items with excellent discrimination tend to be in the moderately difficult range. There was a consistent spread of
difficulty in type A MCQ items used for the test. Many more analyses of this kind should be carried out after each
examination to identify the areas of potential weakness in the type A MCQ tests
to improve the standard of assessment.
INTRODUCTION

In a multi-disciplinary integrated curriculum such as medical
education, Multiple Choice Questions (MCQs) are used
mostly for comprehensive assessment at the end of a
semester or academic session, and they provide feedback to
the educators on students' academic performance. Designing
MCQs is a complex and time-consuming process
compared with writing descriptive questions. After the
assessment, a medical academician needs to know how
effectively the test questions reflect students'
learning-related performance in the course. Because of
their versatility, MCQs are among the best and most
commonly used assessment tools for gauging the knowledge
competencies of medical students. Appropriately
constructed MCQs evaluate higher-order cognitive
processing in Bloom's taxonomy, such as interpretation,
synthesis and application of knowledge, rather than just
testing recall of isolated facts [1], [2].

Among the different types of MCQs used in the medical
field, the most frequently used is the single best-response
type (Type A MCQ) with four choices [3]. The test
questions in this study were taken from the subject of
Biochemistry. The examination questions had been formulated
by the content experts who taught the respective syllabi and
scrutinized by the senior academicians of the department.
Item analysis is the process of assembling, summarizing
and using information from students' responses to assess
the quality of test items [4]. The item statistics help to
determine which items are effective and which need
improvement or omission from the question bank, and they
allow aberrant items to be given attention and revised. One
of the most widely used methods for investigating the
reliability of a test item is Classical Test Theory (CT)
item analysis [4], [5]. This type of item analysis essentially
determines test homogeneity: the more similar the items
in a test, the more likely they are to measure the
same intended aptitude, and therefore the higher the
reliability.

In CT, the item difficulty index (p-value), also called the "ease
index", is the first item characteristic to be determined [5]. It
is described as the percentage of the total group of
students selecting the correct answer to that question. It
ranges from 0 - 100%. The higher the percentage, the
easier the item. The recommended range of difficulty is
from 30 - 70%. Items with p-values below 30% and above 70%
are considered difficult and easy items respectively [2].


$$\text{p-value} = \frac{\text{Frequency of marking correct key}}{\text{Total No. of students}}$$


Determining the p-value is essential, as the reliability of tests
in measuring students' performance is often questioned when
item difficulty does not conform to the ability of the
students. Very easy items should usually be placed either at
the start of the test as 'warm-up' questions or removed
altogether. The difficult items should be reviewed for
possibly confusing language, areas of disagreement, or
even an inappropriate key. Inclusion of very difficult items in
the test depends upon the target of the teacher, who may
want to include them in order to identify top scorers.
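
As a rough illustration of the p-value computation and the 30 - 70% bands described above, a minimal Python sketch follows. The function names and the response counts are hypothetical and are not taken from the study data.

```python
def difficulty_index(correct: int, total: int) -> float:
    """Return the p-value: percentage of examinees who marked the correct key."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def difficulty_band(p_value: float) -> str:
    """Classify an item using the 30 - 70% recommended range quoted in the text."""
    if p_value < 30.0:
        return "difficult"
    if p_value > 70.0:
        return "easy"
    return "average"


# Hypothetical example: 99 of 180 examinees answer correctly.
p = difficulty_index(99, 180)
print(f"p-value = {p:.2f}% ({difficulty_band(p)})")
```

Keeping the banding separate from the calculation makes it easy to swap in a different recommended range if an institution uses other cut-offs.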

Along with the difficulty index, the item Discrimination Index (DI),
also called the "point biserial correlation", is another
important guide [4]. This provides information on the
efficacy of the items in a given test to discriminate between
students with higher and lower abilities [6].


$$DI = 2 \times \frac{H - L}{N}$$


where H and L are the number of correct responses in the
high and low groups respectively, and N is the total number of
students in both high and low groups.

The DI ranges between -1.00 and +1.00. It is expected that the
high scorers select the correct answer for each item more
often than the low scorers. If this is true, the item is
said to have a positive DI (between 0.00 and +1.00),
indicating that the high scorers overall chose the correct answer
for that item more often than the low scorers. If,
however, the low scorers got a specific item correct
more often than the high scorers, then that item has a
negative DI (between -1.00 and 0.00). Flaws in the
construction of test items naturally affect the values of the
discrimination index. Items with poor discrimination ability
should be inspected for potential deficiencies [5].
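
The formula DI = 2 x (H - L) / N can be sketched in a few lines of Python. The function name and the counts below are hypothetical, chosen only to show one positive and one negative DI.

```python
def discrimination_index(high_correct: int, low_correct: int, n_both_groups: int) -> float:
    """DI = 2 * (H - L) / N, with N the combined size of the high and low groups."""
    if n_both_groups <= 0:
        raise ValueError("n_both_groups must be positive")
    return 2.0 * (high_correct - low_correct) / n_both_groups


# Hypothetical counts: two groups of 50 students each, so N = 100.
di_positive = discrimination_index(40, 15, 100)  # high scorers do better
di_negative = discrimination_index(10, 20, 100)  # low scorers do better
print(di_positive, di_negative)
```

When the two groups are of equal size, this is the same quantity as the difference between the proportions of correct answers in the high and low groups.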

The difficulty and discrimination indices are generally associated
reciprocally, although this may not always hold.
Questions having a high p-value (easier questions)
discriminate poorly; conversely, questions with a low
p-value (harder questions) are considered to be good
discriminators [7].

Another useful technique is the analysis of distractors, which
provides information regarding the individual distractors
and the key of a test item. Using these tools, the examiner is
able to modify or remove specific items from subsequent
exams [1]. The distractors are important components of an
item, as they show a relationship between the total test
score and the distractor chosen by the student. Students'
performance depends upon how the distractors are
designed [8]. Distractor Efficiency (DE) is one such tool that
tells us whether the item was well constructed or failed to
perform its purpose. Any distractor that has been selected
by less than 5% of the students is considered to be a
non-functioning distractor (NF-D) [1]. Distractor efficiency ranges
from 0 - 100% and is determined on the basis of the
number of NF-Ds in an item. Four NF-Ds: DE = 0%; 3 NF-Ds:
DE = 25%; 2 NF-Ds: DE = 50%; 1 NF-D: DE = 75%; No NF-D:
DE = 100%. Ideally, low-scoring students, who have not
mastered the subject, should choose the distractors more
often, whereas high scorers should discard them more
frequently while choosing the correct option. Analysing
the distractors makes it easier to identify their flaws, so
that they may be revised, replaced, or removed.
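
The NF-D rule and the DE mapping above lend themselves to a short Python sketch. The function name, the `threshold` parameter and the response counts are hypothetical; the sketch simply assumes that each NF-D lowers DE by 25 percentage points, as in the mapping quoted in the text.

```python
def distractor_efficiency(option_counts: dict, key: str, threshold: float = 0.05):
    """Return (DE in percent, list of non-functioning distractors).

    A distractor counts as non-functioning (NF-D) when fewer than `threshold`
    of all examinees chose it. Each NF-D lowers DE by 25 percentage points.
    """
    total = sum(option_counts.values())
    if total == 0:
        raise ValueError("no responses recorded")
    nfds = [opt for opt, n in option_counts.items()
            if opt != key and n / total < threshold]
    return 100 - 25 * len(nfds), nfds


# Hypothetical response counts for a four-option item keyed 'c' (180 examinees).
counts = {"a": 40, "b": 60, "c": 75, "d": 5}
de, nfds = distractor_efficiency(counts, key="c")
print(de, nfds)
```

Returning the list of NF-Ds alongside the percentage tells the item writer which options to revise, not only how many.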

Tarrant and Ware confirmed that flawed MCQ items
affected the performance of high-achieving students
more than that of borderline students [9]. Construction of a
balanced MCQ, therefore, addresses the concerns of the
students about getting an acceptable average grade and
that of the faculty about having an appropriate spread of
scores [10].

Hence, the present research study was envisioned with an 
objective to analyze the quality of the MCQs (type A) used 
in the assessments of first year medical students in the 
preclinical phase and to test the quality of framed MCQs 
for the subsequent tests. The authors have aimed to find out 
the relationship between the item difficulty and item 
discrimination indices of these MCQs with their distractor 
efficiency and the effect of non-functioning distractors on 
these indices. 

Materials and Methods 
Data Collection 

MCQs (given in Appendix) were taken from the assessment 
test papers from the years 2010-2014 (each year having 
one cohort). Each of these examinations was held during 
the first six months of the preclinical phase and the test 
paper was based only upon the syllabus assigned for the 
examination. A total of 20 test items were selected for the 
item analysis. Each type A MCQ consisted of a stem and 
four choices and the students were to select one best 
answer from these four choices. A correct answer was
awarded 1/2 mark and there were no negative marks for
an incorrect answer.

Item Analysis 

The result of the examinees' performance in the test was
used to investigate the p-value, DI and DE of each MCQ
item. The p-value of an item is calculated as the
percentage of the total number of correct responses to the
test item [11], [12]. It is calculated using the formula,

P = R/T

where P is the item difficulty index, R is the number of correct
responses and T is the total number of responses (which
includes both correct and incorrect responses). An item is
considered difficult when the difficulty index is < 30% and
easy when it is > 70%.

The item DI measures the difference between the
percentage of students in the upper group and that in the
lower group who obtained the correct response. First, the
top and bottom 27% of the total number (n) of students
were identified [12], [13]. The number of students who
obtained the correct response in the upper 27% group
(UG) and in the lower 27% group (LG) was counted. The
respective proportions of students answering correctly in the
upper group (PU) and the lower group (PL) were calculated. The
discrimination index was calculated using the formula DI =
PU - PL. The higher the discrimination index, the better the
ability of the test item to discriminate between students with
higher and lower test scores. Based on Ebel's (1972)
guidelines on CT item analysis, items were categorized
according to their discrimination indices [4], [14]: negative
DI: item to be discarded; DI 0.00 - 0.19: poor item, to be
revised; DI 0.20 - 0.29: acceptable; DI 0.30 - 0.39: good;
DI ≥ 0.40: excellent.
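
The upper/lower 27% procedure and the Ebel categories can be sketched as follows. This is an illustration only: the function names and the ten-examinee data set are hypothetical, and rounding the group size to the nearest integer is an assumption, since the text does not say how fractional group sizes were handled.

```python
import math


def upper_lower_di(total_scores, item_correct, fraction=0.27):
    """Return DI = PU - PL for one item.

    total_scores: total test score of each examinee.
    item_correct: parallel list of booleans, True if that examinee
                  answered this item correctly.
    """
    n = len(total_scores)
    if n == 0 or n != len(item_correct):
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumption: group size is 27% of n, rounded to the nearest integer.
    k = max(1, math.floor(fraction * n + 0.5))
    # Rank examinees from highest to lowest total score (ties keep input order).
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    pu = sum(item_correct[i] for i in upper) / k
    pl = sum(item_correct[i] for i in lower) / k
    return pu - pl


def ebel_category(di):
    """Map a DI value to the categories listed in the text."""
    if di < 0:
        return "discard"
    if di < 0.20:
        return "poor"
    if di < 0.30:
        return "acceptable"
    if di < 0.40:
        return "good"
    return "excellent"


# Hypothetical data for ten examinees (not from the study).
scores = [9.5, 9.0, 8.5, 8.0, 7.0, 6.5, 6.0, 5.0, 4.0, 2.0]
correct = [True, True, True, True, False, True, False, False, True, False]
di = upper_lower_di(scores, correct)
print(f"DI = {di:.2f} ({ebel_category(di)})")
```

With real data, tied total scores at the group boundary would need an explicit tie-breaking rule; the sketch simply keeps the input order.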

Statistical Analysis

All data were expressed as mean ± SD of the total number
of items. The relationship between the item difficulty index
and discrimination index for each test item was
determined by Pearson correlation analysis using IBM SPSS
22. The correlation was considered significant at the 0.01 level
(2-tailed).

Results 

A total of 180 students appeared for the test consisting of
20 Type A MCQs (single-best response MCQs). The mean
score achieved was 6.65 ± 1.64 (maximum 10 marks).
Mean scores according to groups were: lower 4.41 ± 0.73;
middle 6.48 ± 0.75; upper 8.56 ± 0.56. Students were
ranked in order of merit from the highest score of 9.5 to
the lowest score of 2.


Table 1 shows the categorization of the first 27% students 



Score     | Mean | SD
----------|------|-----
Above 63% | 8.56 | 0.56
27 - 63%  | 6.48 | 0.75
Below 27% | 4.41 | 0.73
Total     | 6.65 | 1.64


Table 1. Scores of Upper and Lower Performance Groups in the Test

MCQ | p-value | DI   | DE (%)
----|---------|------|-------
1   | 82.78   | 0.42 | 50
2   | 73.33   | 0.51 | 50
3   | 81.11   | 0.38 | 50
4   | 78.89   | 0.42 | 75
5   | 65.00   | 0.54 | 75
6   | 94.44   | 0.07 | 25
7   | 27.22   | 0.21 | 75
8   | 78.89   | 0.43 | 75
9   | 58.89   | 0.24 | 100
10  | 64.44   | 0.56 | 100
11  | 52.78   | 0.63 | 100
12  | 78.33   | 0.23 | 50
13  | 53.89   | 0.62 | 100
14  | 81.11   | 0.25 | 75
15  | 76.67   | 0.43 | 100
16  | 78.89   | 0.35 | 75
17  | 50.56   | 0.31 | 75
18  | 42.78   | 0.62 | 100
19  | 47.22   | 0.41 | 75
20  | 63.33   | 0.64 | 100

Table 2. Distribution of Difficulty Index (p-value), Discrimination Index
(DI) and Distractor Efficiency (DE) of 20 MCQs in the tests

p-value | Interpretation | p-value (Mean ± SD) | DE (%) (Mean ± SD) | No. of items
--------|----------------|---------------------|--------------------|-------------
> 70    | Easy           | 80.44 ± 5.57        | 62.50 ± 21.25      | 10 (50%)
30 - 70 | Average        | 55.43 ± 7.97        | 91.67 ± 12.50      | 9 (45%)
< 30    | Difficult      | 27.22 ± 0.00        | 75.00 ± 0.00       | 1 (5%)
Total   |                | 66.53 ± 16.82       | 76.25 ± 22.18      | 20 (100%)

Table 3. Distribution of Difficulty Index (p-value) level in the
tests and their Interpretation

DI          | Interpretation | DI (Mean ± SD) | DE (%) (Mean ± SD) | No. of items
------------|----------------|----------------|--------------------|-------------
≥ 0.40      | Excellent      | 0.52 ± 0.09    | 83.33 ± 19.46      | 12 (60%)
0.30 - 0.39 | Good           | 0.35 ± 0.04    | 66.67 ± 14.43      | 3 (15%)
0.20 - 0.29 | Marginal       | 0.23 ± 0.02    | 75.00 ± 20.41      | 4 (20%)
< 0.20      | Poor           | 0.07 ± 0.00    | 25.00 ± 0.00       | 1 (5%)
Total       |                | 0.41 ± 0.16    | 76.25 ± 22.18      | 20 (100%)

Table 4. Distribution of Discrimination Index (DI) in the
tests and their Interpretation


Figure 1. Difficulty Indices (p-value) of MCQs (x-axis: MCQ Number)

(UG) and the last 27% (LG). The distribution of p-value and DI
of the 20 MCQs is depicted in Table 2. The mean p-value of
the test was found to be 66.53 ± 16.82%, which indicates a
relatively easy test paper. Only 20% of items (1/5th of the 20 test
items) in this study crossed the p-value of 80%, as shown in
Table 3 and Figure 1.

The mean DI of the test was 0.41 ± 0.16, demonstrating
acceptable discrimination quality. 19 (95%) test items
had DI > 0.2, discriminating between good and weak students.
12 (60%) items showed excellent DI (> 0.4), as depicted in
Table 4 and Figure 2. Figure 3 depicts the DE of the MCQs.

Figure 2. Discrimination Indices of MCQs (x-axis: MCQ Number)

Figure 3. Distractor Efficiency of MCQs

Figure 4. (a). Correlation of Difficulty Index with Discrimination,
(b). Correlation of Difficulty Index with Discrimination (with trend)

p-value | DI < 0.30: No. of items | DE (%) (Mean ± SD) | DI ≥ 0.30: No. of items | DE (%) (Mean ± SD) | No. of items
--------|-------------------------|--------------------|-------------------------|--------------------|-------------
> 70    | 3 (15%)                 | 50.00 ± 25.00      | 7 (35%)                 | 67.86 ± 18.90      | 10 (50%)
30 - 70 | 1 (5%)                  | 100.00 ± 0.00      | 8 (40%)                 | 90.63 ± 12.94      | 9 (45%)
< 30    | 1 (5%)                  | 75.00 ± 0.00       | 0 (0%)                  | 0.00 ± 0.00        | 1 (5%)
Total   | 5 (25%)                 | 65.00 ± 28.50      | 15 (75%)                | 80.00 ± 19.36      | 20 (100%)

Table 5. Categorization of Difficulty Index (p-value) and
Discrimination Index (DI) of 20 MCQs in the tests

Figure 4 (a) and 4 (b) show the correlation between
individual items' p-value and DI score. The 40% of test items
with DI < 0.4 had p-values ranging between 27 - 95%.
Out of 12 items with excellent DI (> 0.4), 58.33% had a
p-value between 50 - 80%. Correlation analysis between
p-value and DI showed that DI correlated negatively with
p-value (r = -0.288; P = 0.219), but the correlation was not
statistically significant. A negative correlation signifies that
DI decreases as the p-value increases. As the items get easier
(above 75%), the level of DI decreases consistently.
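
The correlation coefficient can be recomputed from the per-item values listed in Table 2. The sketch below is an independent check in plain Python, not the authors' SPSS procedure; the helper name `pearson_r` is hypothetical, and only the coefficient (not the P value) is computed.

```python
import math

# (p-value %, DI) for MCQs 1-20, copied from Table 2.
ITEMS = [
    (82.78, 0.42), (73.33, 0.51), (81.11, 0.38), (78.89, 0.42), (65.00, 0.54),
    (94.44, 0.07), (27.22, 0.21), (78.89, 0.43), (58.89, 0.24), (64.44, 0.56),
    (52.78, 0.63), (78.33, 0.23), (53.89, 0.62), (81.11, 0.25), (76.67, 0.43),
    (78.89, 0.35), (50.56, 0.31), (42.78, 0.62), (47.22, 0.41), (63.33, 0.64),
]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


p_values = [p for p, _ in ITEMS]
di_values = [d for _, d in ITEMS]
r = pearson_r(p_values, di_values)
print(f"r = {r:.3f}")  # should come out close to the reported r = -0.288
```

Because the tabulated values are rounded to two decimals, a recomputed coefficient can differ from the published one in the last digit.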

The corresponding DE of the 20 MCQs was also worked out, details
of which are given in Tables 2 and 6. Two items had the
lowest DIs: item 6, the easiest one (p = 94.44;
DI = 0.07; DE = 25), and item 7, the most difficult (p = 27.22; DI
= 0.21; DE = 75). 45% of items were of average
(recommended) difficulty, with a mean p-value of 55.43 ±
7.97%. Similarly, the majority of items (60%) had excellent DI
(0.52 ± 0.09), with a few items having marginal (20%) and poor
(5%) DI. Overall, 8 (40%) items were 'ideal', having
a p-value from 30 - 70% and DI > 0.24, as evident from
Table 5. The total number of distractors was 80 (4 per item).
Out of the 80 distractors, 19 (23.75%) were NF-Ds. 13 (65%)
items had NF-Ds, while 7 (35%) items had only effective distractors.

Table 6 shows that the item with 3 NF-Ds had a high p-value
(94.44%) and poor DI (0.07), whereas items with 2, 1 and no
NF-Ds had p-values of 78.89 ± 4.13, 63.47 ±
19.90 and 58.97 ± 10.71, and DI of 0.39 ± 0.12,
0.37 ± 0.11 and 0.53 ± 0.15 respectively.
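
The grouping by distractor efficiency can be reproduced from the per-item values in Table 2. The following Python sketch is one way to do it; the variable names are hypothetical, and the number of NF-Ds per item is inferred from the DE mapping given in the Introduction (each NF-D lowers DE by 25 percentage points).

```python
from collections import defaultdict
from statistics import mean

# (p-value %, DI, DE %) for MCQs 1-20, copied from Table 2.
ITEMS = [
    (82.78, 0.42, 50), (73.33, 0.51, 50), (81.11, 0.38, 50), (78.89, 0.42, 75),
    (65.00, 0.54, 75), (94.44, 0.07, 25), (27.22, 0.21, 75), (78.89, 0.43, 75),
    (58.89, 0.24, 100), (64.44, 0.56, 100), (52.78, 0.63, 100), (78.33, 0.23, 50),
    (53.89, 0.62, 100), (81.11, 0.25, 75), (76.67, 0.43, 100), (78.89, 0.35, 75),
    (50.56, 0.31, 75), (42.78, 0.62, 100), (47.22, 0.41, 75), (63.33, 0.64, 100),
]

# Group the items by their distractor efficiency.
groups = defaultdict(list)
for p, di, de in ITEMS:
    groups[de].append((p, di))

summary = {}
for de in sorted(groups):
    rows = groups[de]
    nfd = (100 - de) // 25  # NF-Ds implied by the DE value
    summary[de] = (nfd, len(rows), mean(p for p, _ in rows), mean(d for _, d in rows))
    print(f"DE {de:>3}%: {nfd} NF-D(s), {len(rows)} item(s), "
          f"mean p = {summary[de][2]:.2f}, mean DI = {summary[de][3]:.2f}")

# Total NF-Ds across all items.
total_nfd = sum(nfd * count for nfd, count, _, _ in summary.values())
print("total NF-Ds:", total_nfd)
```

The group sizes and the NF-D total obtained this way can be compared directly with Table 6 and with the count of 19 NF-Ds reported above.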

Figure 5 shows that DE varies inversely with the p-value, with
the most difficult item having a DE of 75.00% and easy items
having a DE of 62.50 ± 21.25%. Items with average difficulty
had a DE of 91.67 ± 12.50%. Figure 6 reveals that DE is
directly related to the DI. Items with good and excellent DI
had DE of 66.67 ± 14.43% and 83.33 ± 19.46%
respectively.


DE (%) | No. of NF-Ds | p-value (Mean ± SD) | DI (Mean ± SD) | No. of items
-------|--------------|---------------------|----------------|-------------
25     | 3            | 94.44 ± 0.00        | 0.07 ± 0.00    | 1 (5%)
50     | 2            | 78.89 ± 4.13        | 0.39 ± 0.12    | 4 (20%)
75     | 1            | 63.47 ± 19.90       | 0.37 ± 0.11    | 8 (40%)
100    | None         | 58.97 ± 10.71       | 0.53 ± 0.15    | 7 (35%)
Total  |              | 66.53 ± 16.82       | 0.41 ± 0.16    | 20 (100%)

Table 6. Distribution of Distractor Efficiency (DE) in the
tests and Non-functional Distractors (NF-Ds)


Figure 5. Correlation of Difficulty Index with Distractor Efficiency (x-axis: MCQ Number)

Figure 6. Correlation of Discrimination Index with Distractor
Efficiency (x-axis: MCQ Number)

Distractor analysis gives an opportunity to study the
responses made by the students on each alternative of the
item. The analysis of 5 questions selected on the basis of
p-value and DI gave varied results.

MCQ No. 6 was a very easy question (p-value =
94.44%) with the lowest DI (0.07), as the upper and lower
groups selected the correct answer almost equally often. The DE
was 25%: 94.44% of students selected the correct
response, leaving the remaining alternatives ineffective.

MCQ No. 7 was selected for its lowest p-value (27.22%),
which made it the most difficult item. More students from the
higher group chose the correct response, but paradoxically
the DI was low (0.21). The DE was 75%, as distractor
alternative 'd' served no purpose at all (hardly 5 students
selected it), whereas distractor alternative 'b' was so effective
that 88 students selected it, which made it most difficult to choose
even from the reduced number of choices. Distractor
alternative 'b' was considered the right answer by many
students in both the upper and lower groups, and it needs to be
revised so that it can be distinguished properly from the correct choice.

MCQ No. 13 had a p-value of 53.89%, DI of 0.62 and DE of
100%, showing that it was moderately difficult and
able to differentiate students into different strata. From the
upper group, 85.19% of students selected the correct
response 'c', while 76.92% of students in the lower group were
distributed among all the distractor choices.

MCQ No. 18 (DI = 0.62) was an excellent discriminator
of average difficulty (p-value = 42.78%), as 80%
of the students in the upper group chose the correct
alternative 'd' and 82% of students in the lower group chose
the distractors. This question had a DE of 100%, as the
distractors were not clear to many students, which made the
item difficult.

MCQ No. 20 (DI = 0.64) was another excellent discriminator,
with a DE of 100% and average difficulty (p-value =
63.33%), as 93% of the students in the upper group chose
the correct alternative 'o' and 28% of students in the lower
group chose the same correct alternative.
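
As a quick check of the figures quoted for MCQ Nos. 18 and 20, the formula DI = PU - PL from the Methods can be applied to the group percentages given above. This is only a sketch: the percentages in the text are rounded, so a small discrepancy from the tabulated DI is to be expected.

```latex
\begin{align*}
\text{MCQ 18:}\quad DI &= P_U - P_L = 0.80 - (1 - 0.82) = 0.80 - 0.18 = 0.62 \\
\text{MCQ 20:}\quad DI &= P_U - P_L = 0.93 - 0.28 = 0.65 \approx 0.64
\end{align*}
```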

Discussion 

An assessment tool should be designed according to the
objective of the assessment. One-best response MCQs, if properly
written and well constructed, are one such assessment tool
that can quickly assess any level of
cognition according to Bloom's taxonomy [1].

The mean difficulty index score of the test was 66.53 ±
16.82%. Only 50% of the total test items had difficulty index
scores crossing 70%. This observation was similar to a study
in a medical school reported by Sim Si-Mui and Rasiah
(2006), who found that about 40% of the MCQ items
crossed a difficulty index of 70%, showing that the test items
were easy for the examinees [12]. Brown (1983) and Algina
(1986) have reported that any DI of 0.2 or higher is
acceptable and that such a test item would be able to
differentiate between weak and good students [15],
[16]. In the present study, 95% of the MCQs from the test
had a DI of more than 0.2. This showed that most of the
MCQs were good or satisfactory questions which would not
need any modification or editing. 12 of the 20 items
showed DI equal to or more than 0.4, indicating that these
MCQ items were excellent test items for differentiating
between poor and good performers.

Negative correlation between difficulty and discrimination 
indices indicated that, with increase in difficulty index, there 
is decrease in discrimination index. As the test items get 
easier, the discrimination index decreases, thus it fails to 
differentiate weak and good students. Sim Si-Mui and 
Rasiah (2006) established that, maximum discrimination 
occurred with difficulty index between 40 - 74% [12]. In the 
present study, 76.9% of the test items with difficulty index 
between 50% and 79% had excellent discrimination index. 

For calculation of the DI, the method adopted by Kelley
(1939) was used, in which the upper and lower 27% of performers
are selected [13]. The only limitation of this method is that it
cannot be used for a small sample size. In this study, however,
the sample size was 180, and hence the observed results
can be expected to reflect the discriminative power of the test
items. One inadequacy of analysing a question only in terms of its
difficulty index is the inability to differentiate between
students of widely differing abilities. Subjective judgment of
item difficulty by the item writer and the vetting committee
may allow faulty items to be selected in the item bank.
Items with a poor discrimination index and too low or too high
a difficulty index should be reviewed by the respective
content experts [17]. This serves as effective feedback
to the respective departments in a medical college about
the quality control of various tests. When the difficulty index
is very small, indicating a difficult question, it may be that the
tested topic was not taught well or is difficult for the students to
grasp. It may also indicate that the topic tested is
inappropriate at that level for the students [18].

In the scatter plot, there is a wide variation in the DIs at
similar levels of difficulty index below 75%. Guessing
by the students might have caused the wide
variation in DIs, as negative marking is
presently not implemented in MCQ tests. An acceptable
level of test difficulty and discrimination indices appears to
be maintained in the test. This observation could be due to
the fact that the test items went through a series of
screenings before being selected for the examinations. The
quality of test items may be further improved if the item
writer reviews the distractors in light of
the calculated discrimination and difficulty index
values. A few common causes of poor discrimination
are ambiguous wording, grey areas of opinion, wrong keys
and areas of controversy [19]. Items showing poor
discrimination should be referred back to the content
experts for revision to improve the standard of these test
items. It is important to evaluate the test items to see their
efficiency in assessing the knowledge of the students
based on the difficulty and discrimination indices of the test
items.

A distractor used by less than 5% of students is not an
effective distractor and should be either replaced or
corrected, as it affects the overall quality of the question. An
NF-D makes the question easier to answer, thereby
affecting the assessment of the student. Items having NF-Ds
should be carefully reviewed. With some alterations in the
distractors, such items can be given as the initial item on the
test, as a 'warm-up' question. However, they would not be able to
differentiate among students, if that is the purpose.
Assessment of MCQs by these indices highlights the
importance of assessment tools for the benefit of both the
student and the teacher [20].

The DE of difficult items in our study was 75% - 100%, which
was expected, as difficult items would require a lot of
guesswork on the part of the student, thereby using all the
distractors. The number of NF-Ds also affects the
discriminative power of an item. It is seen that reducing the
number of distractors from four to three decreases the
difficulty index, while increasing the DI and the reliability
[21].

We observed that items having all four functioning
distractors had excellent discriminating ability (DI = 0.53 ±
0.15) as compared to items with any number of NF-Ds. This
contradicts other studies favoring better discrimination by
three distractors as compared to four [22].

It was also observed that items having good/average
difficulty index (p-value = 30 - 70) and good/excellent DI
(> 0.24), considered to be 'ideal' questions, had a DE of
90.63 ± 12.94%, which is close to that of items having no NF-D.

Tarrant and Ware found three-option items performing
equally well as four-option items and have suggested
writing three-option items, as they require less time to
develop [23]. Similar observations were made in the
literature review conducted by Vyas and Supe [24]. This may be
because writing items with four distractors is a difficult task,
and while writing the fourth distractor we are mostly trying to
fill the gap, allowing it to become the weakest distractor.

Results from this study clearly highlighted the importance of 
item analysis of MCQs. Items having average difficulty and 
high discrimination with functioning distractors should be 
incorporated into future tests to improve the test 
development and review. This would also improve the 
overall test score and properly discriminate among the 
students. 

Conclusion 

There was a consistent spread of difficulty in the type A MCQ
items used for the test. The test items that demonstrated
excellent discrimination tended to be in the moderately
difficult range. Items with all functional distractors
performed best in discriminating among the students.
Factors other than difficulty, such as faulty construction of
test items, did not appear to be significant in this test. The results of
this study should initiate a change in the way MCQ test
items are selected for any examination, and there should
be a proper assessment strategy as part of curriculum
development in medical education. Analyses of this kind
should be carried out after each examination to identify
the areas of potential weakness in the type A MCQ tests and to
improve the standard of assessment.

Appendix

Multiple Choice Questions (MCQs) Marks: 10 

1. Which of the following is NOT true for hydrops fetalis?

a) It is a β-thalassemia b) It is an α-thalassemia

c) Death occurs soon after birth d) All four copies of α-globin genes are mutated

2. The nucleotide present at the 3' end of a t-RNA 
molecule is 

a) uridylate b) cytidylate c) thymidylate d) adenylate 

3. The Michaelis Menten Constant (Km) is

a) equilibrium constant for the dissociation of ES → E + P

b) not changed by the presence of non-competitive
inhibitor

c) equal to ½ Vmax

d) the substrate concentration at ½ Vmax

4. All of the following features are seen in Marasmus EXCEPT 

a) fatty liver b) muscle wasting c) growth retardation 
d) anaemia 

5. The quality of protein is assessed by all of the following EXCEPT

a) protein efficiency ratio (PER) b) biological value (BV)

c) net protein utilization (NPU) d) total number of amino acids present

6. If the amount of adenine present in a DNA molecule is 
20%, then the amount of guanine will be 

a) 20 b) 30 c) 35 d) 40 

7. Transcription of a gene in eukaryotes 

a) always begins at AUG codon 

b) requires a primer RNA 

c) always reads the DNA strand in 3'->5' direction 

d) does not require local unwinding of two DNA strands 

8. Succinate dehydrogenase is inhibited by malonate as it 

a) is a structural analogue of succinate 

b) is a structural analogue of fumarate 

c) binds to a site other than the active site 

d) brings about a conformational change in the active 
site of the enzyme 

9. The pigment visual purple contains 

a) 11 -cis-retinal b) 11 -cis-retinol 

c) all-trans-retinal d) all-trans-retinol 

10. Which of the following coenzymes is not derived from
vitamins?

a) Coenzyme A b) FAD c) TPP d) Coenzyme Q

[24]. R. Vyas, and A. Supe, (2008). "Multiple choice
questions: a literature review on the optimal number of
options". Nat. Med. J. India., Vol. 21, pp. 130-133.

11. The linkages between the glucose residues of
glycogen are

a) α(1→4) & β(1→4) b) β(1→4) & α(1→6)

c) α(1→4) & α(1→6) d) α(1→4) & β(1→6)

12. Zein of corn is called an incomplete protein as it lacks

a) alanine & lysine b) leucine & aspartic acid

c) lysine & tryptophan d) methionine & proline

13. All of the following forces may play a role in the 
formation of the tertiary structure EXCEPT 

a) hydrogen bond b) disulphide bridges 

c) hydrophobic interactions d) peptide bonds 

14. Which one of the following is glycerophospholipid? 

a) Sphingomyelin b) Cerebroside 

c) Phosphatidyl inositol d) Ganglioside 

15. All of the following statements about prostaglandins 
are true EXCEPT 

a) They are cyclic fatty acids 

b) They are potent biologic effector 

c) They cause uterine contraction 

d) They are synthesized only in the prostate gland 

16. Fetal hemoglobin contains the following chains

a) 2α & 2δ b) 2α & 2γ c) 2α & 2β d) 2α & 2ε

17. One of the following is the major storage and
transport form of fatty acids 

a) Cholesterol b) Triacylglycerol 

c) Phospholipid d) Albumin 

18. Pyridoxal phosphate is required as a coenzyme in all 
of the following EXCEPT 

a) heme synthesis b) transamination 

c) decarboxylation d) glycogen synthesis 

19. One of the following points about micro-filaments is 
NOT true? 

a) They form cytoskeleton with microtubules 

b) They provide support and shape 

c) They form intracellular conducting channels 

d) They are involved in muscle cell contraction 

20. The rate limiting enzyme in heme biosynthesis is

a) ferrochelatase b) 5-amino levulinic acid dehydratase

c) 5-amino levulinic acid d) uroporphyrinogen